This workshop will show you how to load, manipulate, visualize, and analyze spatial data in R. Our end application will be species distribution models, but you can use similar techniques for other kinds of spatial analysis and modeling.
Our example today will be species distribution models for aspen (Populus tremuloides) in Utah. See Elith et al. 2009 for a broader overview on species distribution modeling.
This workshop was developed in R Markdown. As a user of this workshop, this means that if you are working from a PDF version of this report, you are seeing the same content that is in the script. As a future user of R Markdown, this means that you can write code, make figures, and annotate your findings all within one document– that you can then export for multiple uses. Please take the opportunity to explore both the PDF and .Rmd versions of this workshop!
This workshop uses the here package to create dynamic references to file paths. This means you never have to use setwd() again! By creating an R project in a folder of your choice, here() identifies the source or home directory for the project on your machine, so that you don’t have to manually set the location of the files or scripts.
If you download or clone the entire github repository, you will have all the data you need to run this model. The repository also already includes prepared data, but this workflow takes you through all the steps you would use to load, prep, and use the data on your own.
Over the course of this workshop, you will learn how to:
Additionally, we will learn some foundational concepts behind species distribution modeling.
First, we need to install all of the packages we need for this project.
# LOAD LIBRARIES
#install.packages("here)
library(here)
here() starts at /Users/lilaleatherman/Documents/github/sdm_tutorial_utah
#install.packages("tidyverse")
library(tidyverse)
[37m── [1mAttaching packages[22m ────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──[39m
[37m[32m✔[37m [34mggplot2[37m 3.1.0 [32m✔[37m [34mpurrr [37m 0.3.2
[32m✔[37m [34mtibble [37m 2.1.1 [32m✔[37m [34mdplyr [37m 0.8.0.[31m1[37m
[32m✔[37m [34mtidyr [37m 0.8.3 [32m✔[37m [34mstringr[37m 1.4.0
[32m✔[37m [34mreadr [37m 1.1.1 [32m✔[37m [34mforcats[37m 0.3.0 [39m
package ‘tibble’ was built under R version 3.5.2package ‘tidyr’ was built under R version 3.5.2package ‘purrr’ was built under R version 3.5.2package ‘dplyr’ was built under R version 3.5.2package ‘stringr’ was built under R version 3.5.2[37m── [1mConflicts[22m ───────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
[31m✖[37m [34mdplyr[37m::[32mfilter()[37m masks [34mstats[37m::filter()
[31m✖[37m [34mdplyr[37m::[32mlag()[37m masks [34mstats[37m::lag()[39m
#install.packages("rgeos")
library(rgeos)
First, we’re going to load some background spatial data for our project.
First, we’ll get a polygon representing the state of Utah. There are several different packages that can help you acquire administrative boundaries, but here we’re using getData() from the raster package.
#a way to get state data
states_list <- c('Utah')
states_all <- getData("GADM",country="USA",level=1, path = here("data/UT"))
UT.shp <- states_all[states_all$NAME_1 %in% states_list,]
#inspect
#UT.shp
plot(UT.shp)
This loads and plots a shapefile for Utah. If you downloaded the whole directory, this file has already been saved and provided for you, but you can save it again for practice.
#create directory for output
dir.create(here("data/UT"))
#export
writeOGR(UT.shp, here("data/UT"), layer = "UT", driver = "ESRI Shapefile", overwrite = TRUE)
#load back in
UT.shp <- shapefile(here("data/UT/UT.shp"))
Next, we’re going to load in our climate data. These data are commonly used for species distribution modeling and represent environmental variables that combine different climatic variables into variables that are more meaningful for species. These data were downloaded from http://worldclim.org/version2 at 30s resolution and cropped to Utah. These data are in Raster stack format - a collection of raster layers. (What is this analagous to in Arc?)
Here, I’m using the stack(), commented out below, can also be used to read in a raster stack, a .grd file. Alternately, the command readRDS can also be used to load the data that I have saved in .Rdata format, which is specific to R.
envStack <- stack(here("data/climate/envStack_init.grd"))
#envStack <- readRDS(here("data/climate/envStack_init.RData"))
#inspect
plot(envStack)
We have environmental data, but we need to crop them only to the spatial extent that we’re interested in: in this case, Utah.
#crop to Utah extent
envStack_UT <- crop(envStack, UT.shp)
#inspect - this just crops to the spatial extent of the object, not the outline of the polygon
plot(envStack_UT)
# crop to Utah boundaries
envStack_UT <- mask(envStack, UT.shp)
plot(envStack_UT)
#export
writeRaster(envStack_UT, file = here("data/climate/envStack_UT.grd"), options = "INTERLEAVE=BAND", overwrite=TRUE)
saveRDS(envStack_UT, file = here("data/climate/envStack_UT.RData"))
Looks good! We can also export our final raster stack. In this case, we can write the file as a raster, or save the data as a .Rdata file. Be careful which file format you save a raster stack in– even though some file formats can be used for both single and multiple layer raster data (e.g., .tif), these formats do not preserve the names of the layers in a raster stack.
Next, we’ll load occurrence data for our species of interest from a few different sources. First, we download data from the Global Biodiversity Information Facility (GBIF) for our species of interest.
#function also doesn't work if the gbif website is down
gbif.POTR <- occ_search(scientificName = "Populus tremuloides",
return = "data",
hasCoordinate = TRUE,
hasGeospatialIssue = FALSE,
limit = 200000,
country = "US", stateProvince = c("Utah"),
fields = c("name", "decimalLongitude", "decimalLatitude"))
colnames(gbif.POTR) <- c("name", "lon", "lat")
#export
dir.create(path = here("/data/occurrence/GBIF"))
'/Users/lilaleatherman/Documents/github/sdm_tutorial_utah//data/occurrence/GBIF' already exists
write.csv(gbif.POTR, "./data/occurrence/GBIF/gbif.POTR.csv", row.names = FALSE)
#read back in
gbif.POTR <- read.csv("./data/occurrence/GBIF/gbif.POTR.csv", stringsAsFactors = FALSE)
###convert to shapefile
#create object to turn into a shapefile
gbif.POTR.shp <- gbif.POTR
#set the fields that represent the coordinates
coordinates(gbif.POTR.shp) <- ~ lon + lat
#convert to spatial points daa frame
gbif.POTR.shp <- SpatialPointsDataFrame(coords = coordinates(gbif.POTR.shp),
data = data.frame(gbif.POTR.shp))
# Same as "define projection" in Arc
proj4string(gbif.POTR.shp) <- CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0")
#check that CRS is the same as enviro data
identicalCRS(gbif.POTR.shp, envStack)
[1] TRUE
#inspect - we can just use plot() to plot the spatial data
plot(gbif.POTR.shp)
And we have points!
Unfortunately, looks like we have some extraneous points that we didn’t expect here, so we’ll need to crop them out.
#make sure we only have points from UT
#basically, subsetting the points to the ones that fall have to do this spatial operation on a spatial object
gbif.POTR.shp <- gbif.POTR.shp[UT.shp, ]
plot(gbif.POTR.shp)
#replace file with the subsetted points in our area of interest
#we can access just the data frame portion of the shapefile
gbif.POTR <- gbif.POTR.shp@data %>%
dplyr::select(-optional)
#export again so we have the most up-to-date version saved! both for our collaborators, and FUTURE US
write.csv(gbif.POTR, "./data/occurrence/GBIF/gbif.POTR.csv", row.names = FALSE)
Let’s plot these on a map!
#to plot points, run this whole chunk
maps::map(database = "state", regions = "utah")
points(gbif.POTR.shp)
I downloaded and processed these data from the Forest Service website, which you can access here: https://apps.fs.usda.gov/fia/datamart/ .
# Normally, you will just load one version of the data to manipulate. But here, I'm showing you a coupel different ways to load in these data.
#load shapefile
fia.POTR.shp <- shapefile(here("data/occurrence/FIA/fia.POTR.shp"))
#load .csv
fia.POTR <- read.csv(here("data/occurrence/FIA/FIA_POTR_UT_presAbs.csv"))
Let’s look at these data:
# plot with base R, again needs to be a shapefile to plot like this
# color by presence / absence recorded
plot(UT.shp)
points(fia.POTR.shp, pch = 21, bg = "white", cex = 0.5)
points(fia.POTR.shp[fia.POTR$presAbs == 1,], pch = 21, bg = "dodgerblue")
The FIA data include points both where POTR was observed, and where it was not observed.
# join the data
# Make the fields align!
#The GBIF data only represent places where POTR was observed, but we need to add a field that indicates this.
#Also, the FIA data do not have a "name" column, so we remove it here
gbif.POTR <- gbif.POTR %>%
mutate(presAbs = 1) %>%
select(-name)
# bind the data frames together
POTR.dat <- bind_rows(fia.POTR, gbif.POTR) %>%
select(lon, lat, presAbs) # make sure we get only the fields present in both datasets
# #### alternately, you can join two shapefiles like so:
# # still have to create presAbs field for the gbif data
# gbif.POTR.shp@data <- data.frame(gbif.POTR.shp@data[c("lon", "lat")],
# presAbs = rep(1, nrow(gbif.POTR.shp)))
#
# #check that the projections are the same for these data sets
# identicalCRS(gbif.POTR.shp, fia.POTR.shp)
#
# #bind
# POTR.dat.shp <- rbind(gbif.POTR.shp, fia.POTR.shp)
# plot all together
# ggplot can be used to plot
# another way to get spatial data - ready for plotting in ggplot.
# this is just a list of points that can be rendered as a polygon by ggplot.
# I like ggplot, which is part of the "tidyverse", because I think it's a little easier to understand than base R!
UT <- map_data("state", region = "utah")
ggplot() +
geom_polygon(data = UT, aes(x=long, y = lat, group = group), fill = NA, color = "black") +
geom_point(data = POTR.dat %>% arrange(presAbs),
aes(x = lon, y = lat, color = factor(presAbs))) +
coord_fixed(1.3) +
labs(title = "FIA and GBIF PRESENCE/ABSENCE DATA - Populus tremuloides")
#export your data!
write.csv(POTR.dat, here("data/occurrence/full_POTR.csv"), row.names = FALSE )
Another task you might want to do is to only look at one raster, within the extent of another raster. We won’t be using this today, but I have provided this as an example. We have a layer that represents areas of forest in Utah, which was downloaded and prepped from : https://swregap.org/data/landcover/
#load mask layer
forest_mask <- raster(here("data/ut_landcover/ut_forestmask.tif"))
#inspect
plot(forest_mask)
#make sure mask and layer to be masked have same extent and projection (and resolution?) - otherwise, it won't work!
compareRaster(forest_mask, envStack_UT)
[1] TRUE
#perform the operation
envStack_mask <- mask(x = envStack_UT, mask = forest_mask, maskvalue = 0)
#inspect
plot(envStack_mask)
So now, if we wanted it– we only have environmental data for areas where there is forest in Utah.
For species distribution modeling, we need to create a data frame that has the values for environmental variables at each of our points. We can do an extract operation to get this information. We do this using the extract() function in the raster package. You can extract either by specifying the coordinates, or by using the shapefile
## extract enviro data to points
#by specifying coordinates
env.dat <- raster::extract(x = envStack_UT, y = POTR.dat[,c("lon", "lat")])
head(env.dat)
Annual.Mean.Temperature Mean.Diurnal.Range Isothermality Temperature.Seasonality
[1,] 9.487500 14.79167 37.44726 887.3636
[2,] 10.350000 11.41667 31.62512 904.1747
[3,] 9.054167 14.00833 34.33415 952.7078
[4,] 8.079166 13.24167 36.17942 849.0461
[5,] 6.712500 16.45833 43.08464 795.2562
[6,] 9.108334 15.66667 34.89235 1031.5254
Max.Temperature.of.Warmest.Month Min.Temperature.of.Coldest.Month Temperature.Annual.Range
[1,] 28.7 -10.8 39.5
[2,] 28.3 -7.8 36.1
[3,] 28.7 -12.1 40.8
[4,] 26.4 -10.2 36.6
[5,] 25.6 -12.6 38.2
[6,] 30.2 -14.7 44.9
Mean.Temperature.of.Wettest.Quarter Mean.Temperature.of.Driest.Quarter
[1,] 8.233333 20.750000
[2,] 21.183334 14.083333
[3,] 20.133333 -1.633333
[4,] 18.383333 11.383333
[5,] 16.283333 9.916667
[6,] 16.616667 -3.950000
Mean.Temperature.of.Warmest.Quarter Mean.Temperature.of.Coldest.Quarter Annual.Precipitation
[1,] 20.75000 -1.1666666 309
[2,] 21.90000 -0.2500001 287
[3,] 20.96667 -2.6333334 212
[4,] 18.88333 -1.8500000 397
[5,] 16.86667 -2.6666667 316
[6,] 21.50000 -3.9499998 273
Precipitation.of.Wettest.Month Precipitation.of.Driest.Month Precipitation.Seasonality
[1,] 34 16 19.20286
[2,] 35 11 28.65912
[3,] 27 12 32.08870
[4,] 45 17 25.09788
[5,] 49 16 34.61431
[6,] 30 16 21.77701
Precipitation.of.Wettest.Quarter Precipitation.of.Driest.Quarter Precipitation.of.Warmest.Quarter
[1,] 98 63 63
[2,] 97 44 78
[3,] 79 39 67
[4,] 129 72 105
[5,] 115 53 96
[6,] 85 49 71
Precipitation.of.Coldest.Quarter
[1,] 68
[2,] 69
[3,] 39
[4,] 86
[5,] 65
[6,] 49
str(env.dat)
num [1:3861, 1:19] 9.49 10.35 9.05 8.08 6.71 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:19] "Annual.Mean.Temperature" "Mean.Diurnal.Range" "Isothermality" "Temperature.Seasonality" ...
# # by using the shapefile which is already spatial
# env.dat <- raster::extract(x = envStack_UT, y = POTR.dat.shp)
# head(env.dat)
plot(env.dat[,2] ~ factor(POTR.dat$presAbs))
Unlike the previous steps we’ve completed, this step is more exclusive to species distribution modeling. You can run the following chunks of code, which are specific to the species distribution modeling process
POTR.mod.dat <- BIOMOD_FormatingData(resp.var = as.numeric(POTR.dat$presAbs),
resp.xy = POTR.dat[, c("lon", "lat")],
#resp.var = POTR.dat.shp, # for input: can use shapefile with the presence-absence response in the @data slot
expl.var = stack(envStack_UT),
#eval.resp.var = ,
#PA.strategy = "random",
#PA.nb.rep = 0,
#PA.nb.absences = 0,
resp.name = "Populus.tremuloides")
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Populus.tremuloides Data Formating -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> No pseudo absences selection !
! No data has been set aside for modeling evaluation
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
POTR.mod.dat
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 'BIOMOD.formated.data' -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
sp.name = Populus.tremuloides
530 presences, 3331 true absences and 0 undifined points in dataset
19 explanatory variables
Annual.Mean.Temperature Mean.Diurnal.Range Isothermality Temperature.Seasonality
Min. :-1.133 Min. : 8.258 Min. :28.58 Min. : 674.5
1st Qu.: 4.292 1st Qu.:12.950 1st Qu.:36.24 1st Qu.: 793.8
Median : 6.562 Median :14.283 Median :38.01 Median : 835.2
Mean : 6.516 Mean :14.183 Mean :38.06 Mean : 843.5
3rd Qu.: 8.783 3rd Qu.:15.417 3rd Qu.:39.91 3rd Qu.: 889.7
Max. :15.442 Max. :19.967 Max. :45.62 Max. :1156.1
Max.Temperature.of.Warmest.Month Min.Temperature.of.Coldest.Month Temperature.Annual.Range
Min. :14.50 Min. :-19.00 Min. :27.30
1st Qu.:22.00 1st Qu.:-13.40 1st Qu.:34.80
Median :25.30 Median :-12.20 Median :37.60
Mean :25.13 Mean :-12.09 Mean :37.22
3rd Qu.:28.30 3rd Qu.:-10.80 3rd Qu.:39.60
Max. :34.90 Max. : -3.10 Max. :50.70
Mean.Temperature.of.Wettest.Quarter Mean.Temperature.of.Driest.Quarter Mean.Temperature.of.Warmest.Quarter
Min. :-6.283 Min. :-7.550 Min. : 8.267
1st Qu.: 6.050 1st Qu.:-1.467 1st Qu.:14.533
Median :12.867 Median : 7.350 Median :17.217
Mean :11.103 Mean : 6.798 Mean :17.224
3rd Qu.:16.650 3rd Qu.:13.917 3rd Qu.:19.950
Max. :24.050 Max. :23.267 Max. :26.717
Mean.Temperature.of.Coldest.Quarter Annual.Precipitation Precipitation.of.Wettest.Month
Min. :-8.917 Min. :155.0 Min. :21.00
1st Qu.:-5.150 1st Qu.:312.0 1st Qu.:36.00
Median :-3.600 Median :374.0 Median :42.00
Mean :-3.390 Mean :372.2 Mean :42.47
3rd Qu.:-1.700 3rd Qu.:431.0 3rd Qu.:48.00
Max. : 5.350 Max. :639.0 Max. :73.00
Precipitation.of.Driest.Month Precipitation.Seasonality Precipitation.of.Wettest.Quarter
Min. : 5.00 Min. : 6.004 Min. : 58.0
1st Qu.:14.00 1st Qu.:14.976 1st Qu.: 99.0
Median :20.00 Median :20.832 Median :114.0
Mean :20.26 Mean :21.254 Mean :115.4
3rd Qu.:25.00 3rd Qu.:26.762 3rd Qu.:131.0
Max. :40.00 Max. :40.002 Max. :202.0
Precipitation.of.Driest.Quarter Precipitation.of.Warmest.Quarter Precipitation.of.Coldest.Quarter
Min. : 23.00 Min. : 41.00 Min. : 23.00
1st Qu.: 57.00 1st Qu.: 79.00 1st Qu.: 66.00
Median : 72.00 Median : 91.00 Median : 87.00
Mean : 71.33 Mean : 94.84 Mean : 88.26
3rd Qu.: 86.00 3rd Qu.:107.00 3rd Qu.:110.00
Max. :124.00 Max. :202.00 Max. :177.00
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
BIOMOD_ModelingOptions() # need to install java in order to run Maxent.Phillips; we won't be doing this today because it can be pretty finicky!
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 'BIOMOD.Model.Options' -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
GLM = list( type = 'quadratic',
interaction.level = 0,
myFormula = NULL,
test = 'AIC',
family = binomial(link = 'logit'),
mustart = 0.5,
control = glm.control(epsilon = 1e-08, maxit = 50, trace = FALSE) ),
GBM = list( distribution = 'bernoulli',
n.trees = 2500,
interaction.depth = 7,
n.minobsinnode = 5,
shrinkage = 0.001,
bag.fraction = 0.5,
train.fraction = 1,
cv.folds = 3,
keep.data = FALSE,
verbose = FALSE,
perf.method = 'cv',
n.cores = 1),
GAM = list( algo = 'GAM_mgcv',
type = 's_smoother',
k = -1,
interaction.level = 0,
myFormula = NULL,
family = binomial(link = 'logit'),
method = 'GCV.Cp',
optimizer = c('outer','newton'),
select = FALSE,
knots = NULL,
paraPen = NULL,
control = list(nthreads = 1, irls.reg = 0, epsilon = 1e-07, maxit = 200, trace = FALSE
, mgcv.tol = 1e-07, mgcv.half = 15, rank.tol = 1.49011611938477e-08
, nlm = list(ndigit=7, gradtol=1e-06, stepmax=2, steptol=1e-04, iterlim=200, check.analyticals=0)
, optim = list(factr=1e+07), newton = list(conv.tol=1e-06, maxNstep=5, maxSstep=2, maxHalf=30, use.svd=0)
, outerPIsteps = 0, idLinksBases = TRUE, scalePenalty = TRUE, efs.lspmax = 15, efs.tol = 0.1, keepData = FALSE
, scale.est = fletcher, edge.correct = FALSE) ),
CTA = list( method = 'class',
parms = 'default',
cost = NULL,
control = list(xval = 5, minbucket = 5, minsplit = 5, cp = 0.001, maxdepth = 25) ),
ANN = list( NbCV = 5,
size = NULL,
decay = NULL,
rang = 0.1,
maxit = 200),
SRE = list( quant = 0.025),
FDA = list( method = 'mars',
add_args = NULL),
MARS = list( type = 'simple',
interaction.level = 0,
myFormula = NULL,
nk = NULL,
penalty = 2,
thresh = 0.001,
nprune = NULL,
pmethod = 'backward'),
RF = list( do.classif = TRUE,
ntree = 500,
mtry = 'default',
nodesize = 5,
maxnodes = NULL),
MAXENT.Phillips = list( path_to_maxent.jar = '/Users/lilaleatherman/Documents/github/sdm_tutorial_utah',
memory_allocated = 512,
background_data_dir = 'default',
maximumbackground = 'default',
maximumiterations = 200,
visible = FALSE,
linear = TRUE,
quadratic = TRUE,
product = TRUE,
threshold = TRUE,
hinge = TRUE,
lq2lqptthreshold = 80,
l2lqthreshold = 10,
hingethreshold = 15,
beta_threshold = -1,
beta_categorical = -1,
beta_lqp = -1,
beta_hinge = -1,
betamultiplier = 1,
defaultprevalence = 0.5),
MAXENT.Tsuruoka = list( l1_regularizer = 0,
l2_regularizer = 0,
use_sgd = FALSE,
set_heldout = 0,
verbose = FALSE)
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
#myBiomodOptions <- BIOMOD_ModelingOptions(MAXENT.Phillips = list(path_to_maxent.jar = "maxent/maxent.jar"))
POTR.mod <- BIOMOD_Modeling(data = POTR.mod.dat,
#models = c('GLM','GAM','ANN','RF','MAXENT.Tsuruoka'),
models = c('GBM','ANN','RF'),
#SaveObj = TRUE,
#models.options = myBiomodOptions,
# , DataSplit = 80
VarImport = 1)
Loading required library...
Checking Models arguments...
Models will run with 'defaults' parameters
Creating suitable Workdir...
> No weights : all observations will have the same weight
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Populus.tremuloides Modeling Summary -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
19 environmental variables ( Annual.Mean.Temperature Mean.Diurnal.Range Isothermality Temperature.Seasonality Max.Temperature.of.Warmest.Month Min.Temperature.of.Coldest.Month Temperature.Annual.Range Mean.Temperature.of.Wettest.Quarter Mean.Temperature.of.Driest.Quarter Mean.Temperature.of.Warmest.Quarter Mean.Temperature.of.Coldest.Quarter Annual.Precipitation Precipitation.of.Wettest.Month Precipitation.of.Driest.Month Precipitation.Seasonality Precipitation.of.Wettest.Quarter Precipitation.of.Driest.Quarter Precipitation.of.Warmest.Quarter Precipitation.of.Coldest.Quarter )
Number of evaluation repetitions : 1
Models selected : GBM ANN RF
Total number of model runs : 3
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
-=-=-=- Run : Populus.tremuloides_AllData
-=-=-=--=-=-=- Populus.tremuloides_AllData_Full
Model=Generalised Boosting Regression
2500 maximum different trees and 3 Fold Cross-Validation
Evaluating Model stuff...
Evaluating Predictor Contributions...
Model=Artificial Neural Network
5 Fold Cross Validation + 3 Repetitions
Model scaling...
Evaluating Model stuff...
Evaluating Predictor Contributions...
Model=Breiman and Cutler's random forests for classification and regression
Evaluating Model stuff...
Evaluating Predictor Contributions...
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
POTR.mod
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= BIOMOD.models.out -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Modeling id : 1555369258
Species modeled : Populus.tremuloides
Considered variables : Annual.Mean.Temperature Mean.Diurnal.Range Isothermality Temperature.Seasonality
Max.Temperature.of.Warmest.Month Min.Temperature.of.Coldest.Month Temperature.Annual.Range
Mean.Temperature.of.Wettest.Quarter Mean.Temperature.of.Driest.Quarter Mean.Temperature.of.Warmest.Quarter
Mean.Temperature.of.Coldest.Quarter Annual.Precipitation Precipitation.of.Wettest.Month
Precipitation.of.Driest.Month Precipitation.Seasonality Precipitation.of.Wettest.Quarter
Precipitation.of.Driest.Quarter Precipitation.of.Warmest.Quarter Precipitation.of.Coldest.Quarter
Computed Models : Populus.tremuloides_AllData_Full_GBM Populus.tremuloides_AllData_Full_ANN
Populus.tremuloides_AllData_Full_RF
Failed Models : none
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
So now we’ve built a suite of species distribution models! This graph is one method of evaluating the boosted regression tree model, abbreviated in the model formulation as “GBM,” or “generalized boosted model.” We can plot and visualize these in a couple different ways.
Below I use tidyverse techniques to manipulate and plot these evaluations in a more attractive way, using the ggplot package.
POTR.mod.eval <- get_evaluations(POTR.mod)
dimnames(POTR.mod.eval)
[[1]]
[1] "KAPPA" "TSS" "ROC"
[[2]]
[1] "Testing.data" "Cutoff" "Sensitivity" "Specificity"
[[3]]
[1] "GBM" "ANN" "RF"
[[4]]
[1] "Full"
[[5]]
Populus.tremuloides_AllData
"AllData"
POTR.mod.eval["TSS","Testing.data",,,]
GBM ANN RF
0.637 0.615 0.869
POTR.mod.eval["KAPPA","Testing.data",,,]
GBM ANN RF
0.432 0.397 0.649
POTR.mod.eval["ROC","Testing.data",,,]
GBM ANN RF
0.880 0.857 0.947
# prep and plot these data a little more attractively
POTR.mod.eval <-
data.frame(POTR.mod.eval[1:3, 1:4, ,,]) %>%
rownames_to_column("metric") %>%
gather(-metric, key = "mod", value = "val") %>%
mutate(mod = gsub(pattern = "Testing.data", "Testingdata", mod)) %>%
separate(mod, c("val_type", "model"))
#inspect evaluation statistics
print(
POTR.mod.eval %>%
filter(val_type == "Testingdata") %>%
ggplot(aes(x = metric, y = val, fill = model)) +
geom_bar(stat = "identity", position = "dodge") +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
labs(title = "model evaluation - on testing data",
x = "model type")
)
NA
So, this has reorganized the data so that we are now plotting model evaluation statistics for each model type, side by side.
We can do the same thing to look at the importance of different variables in the model. Here, this means how much each environmental variable contributes to the model.
get_variables_importance(POTR.mod)
, , Full, AllData
GBM ANN RF
Annual.Mean.Temperature 0.448 0.496 0.126
Mean.Diurnal.Range 0.001 0.030 0.011
Isothermality 0.002 0.114 0.010
Temperature.Seasonality 0.004 0.305 0.012
Max.Temperature.of.Warmest.Month 0.030 0.101 0.066
Min.Temperature.of.Coldest.Month 0.001 0.009 0.009
Temperature.Annual.Range 0.001 0.051 0.021
Mean.Temperature.of.Wettest.Quarter 0.004 0.114 0.015
Mean.Temperature.of.Driest.Quarter 0.006 0.189 0.016
Mean.Temperature.of.Warmest.Quarter 0.008 0.074 0.076
Mean.Temperature.of.Coldest.Quarter 0.016 0.242 0.052
Annual.Precipitation 0.011 0.289 0.026
Precipitation.of.Wettest.Month 0.000 0.263 0.005
Precipitation.of.Driest.Month 0.002 0.336 0.014
Precipitation.Seasonality 0.005 0.197 0.011
Precipitation.of.Wettest.Quarter 0.002 0.298 0.011
Precipitation.of.Driest.Quarter 0.002 0.233 0.022
Precipitation.of.Warmest.Quarter 0.008 0.184 0.017
Precipitation.of.Coldest.Quarter 0.009 0.152 0.014
We can also prep and plot this a little more attractively.
# investigate variable importance
# still some parsing to do here to visualize
var.imp <- (get_variables_importance(POTR.mod))
var.imp <- data.frame(var.imp)
# var.imp$var <- names(envStack_aoi)
# var.imp
var.imp <-
var.imp %>%
mutate(var = names(envStack_UT))%>%
gather(-var, key = "mod", value = "val") %>%
separate(mod, c("model_type", "model_level", "data_amt"))
#plot var importance
print(
var.imp %>%
#mutate(val = ifelse(model_type == "RF", val*10, val)) %>% #multiply RF values by 10 to compare better? idk why these are so low
ggplot(aes(y = val, x = reorder(var, val), fill = val, group = interaction(data_amt, model_type))) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
labs(title = "Variable importance, by model",
x = "variable") +
facet_grid(~model_type)
)
Lastly in our species distribution modeling process, we create an ensemble model that averages all the models together. Here, this means we make a united map or raster of our results. First we’ll build the model, then we’ll project it.
myBiomodEM <- BIOMOD_EnsembleModeling(
modeling.output = POTR.mod,
chosen.models = 'all',
em.by= 'all',
eval.metric = c('TSS'),
eval.metric.quality.threshold = c(0.6),
prob.mean = T,
prob.cv = T,
prob.ci = T,
prob.ci.alpha = 0.05,
prob.median = T,
committee.averaging = T,
prob.mean.weight = T,
prob.mean.weight.decay = 'proportional')
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Build Ensemble Models -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
! all models available will be included in ensemble.modeling
> Evaluation & Weighting methods summary :
TSS over 0.6
> mergedAlgo_mergedRun_mergedData ensemble modeling
! Models projections for whole zonation required...
> Projecting Populus.tremuloides_AllData_Full_GBM ...
> Projecting Populus.tremuloides_AllData_Full_ANN ...
> Projecting Populus.tremuloides_AllData_Full_RF ...
> Mean of probabilities...
Evaluating Model stuff...
> Coef of variation of probabilities...
Evaluating Model stuff...
> Confidence Interval...
Evaluating Model stuff...
Evaluating Model stuff...
> Median of probabilities...
Evaluating Model stuff...
> Committee averaging...
Evaluating Model stuff...
> Probabilities weighting mean...
original models scores = 0.637 0.615 0.869
final models weights = 0.3 0.29 0.41
Evaluating Model stuff...
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
myBiomodEM
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 'BIOMOD.EnsembleModeling.out' -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
sp.name : Populus.tremuloides
expl.var.names : Annual.Mean.Temperature Mean.Diurnal.Range Isothermality Temperature.Seasonality
Max.Temperature.of.Warmest.Month Min.Temperature.of.Coldest.Month Temperature.Annual.Range
Mean.Temperature.of.Wettest.Quarter Mean.Temperature.of.Driest.Quarter Mean.Temperature.of.Warmest.Quarter
Mean.Temperature.of.Coldest.Quarter Annual.Precipitation Precipitation.of.Wettest.Month
Precipitation.of.Driest.Month Precipitation.Seasonality Precipitation.of.Wettest.Quarter
Precipitation.of.Driest.Quarter Precipitation.of.Warmest.Quarter Precipitation.of.Coldest.Quarter
models computed:
Populus.tremuloides_EMmeanByTSS_mergedAlgo_mergedRun_mergedData, Populus.tremuloides_EMcvByTSS_mergedAlgo_mergedRun_mergedData, Populus.tremuloides_EMciInfByTSS_mergedAlgo_mergedRun_mergedData, Populus.tremuloides_EMciSupByTSS_mergedAlgo_mergedRun_mergedData, Populus.tremuloides_EMmedianByTSS_mergedAlgo_mergedRun_mergedData, Populus.tremuloides_EMcaByTSS_mergedAlgo_mergedRun_mergedData, Populus.tremuloides_EMwmeanByTSS_mergedAlgo_mergedRun_mergedData
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
get_evaluations(myBiomodEM)
$Populus.tremuloides_EMmeanByTSS_mergedAlgo_mergedRun_mergedData
Testing.data Cutoff Sensitivity Specificity
KAPPA 0.542 310 73.585 89.493
TSS 0.745 136 98.679 75.593
ROC 0.922 140 98.491 76.043
$Populus.tremuloides_EMcvByTSS_mergedAlgo_mergedRun_mergedData
Testing.data Cutoff Sensitivity Specificity
KAPPA NA NA NA NA
TSS NA NA NA NA
ROC NA NA NA NA
$Populus.tremuloides_EMciInfByTSS_mergedAlgo_mergedRun_mergedData
Testing.data Cutoff Sensitivity Specificity
KAPPA 0.513 144.0 68.868 89.673
TSS 0.617 77.0 76.415 85.170
ROC 0.828 84.5 75.849 86.010
$Populus.tremuloides_EMciSupByTSS_mergedAlgo_mergedRun_mergedData
Testing.data Cutoff Sensitivity Specificity
KAPPA 0.529 413.0 92.264 82.858
TSS 0.783 364.0 98.491 79.946
ROC 0.921 363.5 98.491 79.946
$Populus.tremuloides_EMmedianByTSS_mergedAlgo_mergedRun_mergedData
Testing.data Cutoff Sensitivity Specificity
KAPPA 0.465 285.0 73.208 85.650
TSS 0.672 110.0 93.208 73.942
ROC 0.899 114.5 92.830 74.422
$Populus.tremuloides_EMcaByTSS_mergedAlgo_mergedRun_mergedData
Testing.data Cutoff Sensitivity Specificity
KAPPA 0.602 828.0 88.868 87.872
TSS 0.767 828.0 88.868 87.872
ROC 0.919 833.5 88.868 87.872
$Populus.tremuloides_EMwmeanByTSS_mergedAlgo_mergedRun_mergedData
Testing.data Cutoff Sensitivity Specificity
KAPPA 0.568 304.0 78.491 89.282
TSS 0.763 160.0 98.491 77.634
ROC 0.929 161.5 98.491 77.844
myBiomodProj <- BIOMOD_Projection(modeling.output = POTR.mod,
new.env = stack(envStack_UT),
proj.name = 'current' ,
selected.models = 'all' , # will return separate projections for each model
binary.meth = 'TSS' ,
compress = 'xz' ,
clamping.mask = F,
output.format = '.grd' )
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Do Models Projections -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Building clamping mask
> Projecting Populus.tremuloides_AllData_Full_GBM ...
> Projecting Populus.tremuloides_AllData_Full_ANN ...
> Projecting Populus.tremuloides_AllData_Full_RF ...
> Building TSS binaries
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
myBiomodProj
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 'BIOMOD.projection.out' -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Projection directory : Populus.tremuloides/current
sp.name : Populus.tremuloides
expl.var.names : Annual.Mean.Temperature Mean.Diurnal.Range Isothermality Temperature.Seasonality
Max.Temperature.of.Warmest.Month Min.Temperature.of.Coldest.Month Temperature.Annual.Range
Mean.Temperature.of.Wettest.Quarter Mean.Temperature.of.Driest.Quarter Mean.Temperature.of.Warmest.Quarter
Mean.Temperature.of.Coldest.Quarter Annual.Precipitation Precipitation.of.Wettest.Month
Precipitation.of.Driest.Month Precipitation.Seasonality Precipitation.of.Wettest.Quarter
Precipitation.of.Driest.Quarter Precipitation.of.Warmest.Quarter Precipitation.of.Coldest.Quarter
modeling id : 1555369258 ( Populus.tremuloides/Populus.tremuloides.1555369258.models.out )
models projected :
Populus.tremuloides_AllData_Full_GBM, Populus.tremuloides_AllData_Full_ANN, Populus.tremuloides_AllData_Full_RF
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
plot(myBiomodProj)
We can also plot these one at a time.
plot(myBiomodProj, str.grep = 'RF' )
If we want to plot the map on its own, we can extract the results to a new object.This is the output that you can save and manipulate.
rr dir.create(here(_output)) dir.create(here(_output/rasters))
writeRaster(myCurrentProj, filename = here(_output/rasters/biomod_out.grd), options = =BAND, overwrite = TRUE)
present_In <- stack(here(_output/rasters/biomod_out.grd)) present_In
I have included a supplement that will go into more detail, but you can also make maps using ggplot2 that I think are more straightforward to customize.
rr library(rasterVis) library(viridis)
RF_model <- present_In$Populus.tremuloides_AllData_Full_RF
p = gplot(RF_model) + geom_tile(aes(fill = value)) + scale_fill_viridis(na.value = ) + theme_bw() + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) + coord_fixed(ratio = 1.3) # sets the xy resolution to a constant value
p